[Gh540] Add ALPpd encoding to parquet #48345
base: main
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also:
ed922b2 to 4c50497
4c50497 to 48fd8fc
1b78a5c to d563ce0
Also ensure that no line exceeds 90 characters
This reverts commit e85658b42b5373ef7e54295b100d1f083d55dd8d.
I think the more standard place to put test data is either arrow-testing or parquet-testing, so it can be used across implementations.
In this case I would recommend https://github.com/apache/parquet-testing
    DELTA_BYTE_ARRAY = 7,
    RLE_DICTIONARY = 8,
    BYTE_STREAM_SPLIT = 9,
    ALP = 10,
🎉
Thanks @prtkgaur -- it is super exciting to see this movement. Unfortunately, I am not familiar enough with the C/C++ codebase to give this a realistic review. I started the CI checks on this PR and had some comments about the testing.
Co-authored-by: [email protected]
Rationale for this change
ALP significantly improves the compression ratio and decompression speed of float/double columns compared with other encoding/compression techniques.
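For readers unfamiliar with ALP, here is a minimal sketch of the pseudo-decimal idea. It is not the code in this PR, and every name in it (`alp_sketch`, `TryEncode`, `EncodeVector`, and so on) is hypothetical. The sketch makes simplifying assumptions: a single power-of-ten exponent per vector, division on decode, and no bit-packing of the resulting integers, all of which the real encoding may handle differently. A double is stored as an integer only when decoding reproduces it bit-for-bit; anything else is kept verbatim as an exception.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Illustrative only: hypothetical names, not the API introduced by the PR.
namespace alp_sketch {

constexpr int kMaxExponent = 18;  // assumption: 10^18 still fits the sketch

// 10^n as a double; every power up to 10^22 is exactly representable.
inline double Pow10(int n) {
  assert(n >= 0 && n <= kMaxExponent);
  double p = 1.0;
  for (int i = 0; i < n; ++i) p *= 10.0;
  return p;
}

// Compares the raw IEEE-754 bit patterns (so +0.0 and -0.0 differ).
inline bool BitEqual(double a, double b) {
  std::uint64_t x, y;
  std::memcpy(&x, &a, sizeof x);
  std::memcpy(&y, &b, sizeof y);
  return x == y;
}

// Reconstructs the double from its integer digits and the exponent.
inline double Decode(std::int64_t digits, int exponent) {
  return static_cast<double>(digits) / Pow10(exponent);
}

// Tries to express v as digits / 10^exponent. Succeeds only if decoding
// gives back exactly the same bits, which keeps the scheme lossless.
inline bool TryEncode(double v, int exponent, std::int64_t* digits) {
  const double scaled = v * Pow10(exponent);
  // Reject NaN, infinities, and magnitudes a double cannot hold exactly.
  if (!std::isfinite(scaled) || std::fabs(scaled) > 9.0e15) return false;
  const std::int64_t d = std::llround(scaled);
  if (!BitEqual(Decode(d, exponent), v)) return false;
  *digits = d;
  return true;
}

struct EncodedVector {
  int exponent = 0;
  std::vector<std::int64_t> digits;              // one slot per input value
  std::vector<std::size_t> exception_positions;  // slots that did not encode
  std::vector<double> exception_values;          // their original values
};

// Brute-force search for the exponent that leaves the fewest exceptions.
inline EncodedVector EncodeVector(const std::vector<double>& values) {
  EncodedVector best;
  bool have_best = false;
  for (int e = 0; e <= kMaxExponent; ++e) {
    EncodedVector cur;
    cur.exponent = e;
    for (std::size_t i = 0; i < values.size(); ++i) {
      std::int64_t d = 0;
      if (TryEncode(values[i], e, &d)) {
        cur.digits.push_back(d);
      } else {
        cur.digits.push_back(0);  // placeholder, patched on decode
        cur.exception_positions.push_back(i);
        cur.exception_values.push_back(values[i]);
      }
    }
    if (!have_best ||
        cur.exception_positions.size() < best.exception_positions.size()) {
      best = std::move(cur);
      have_best = true;
    }
  }
  return best;
}

// Decodes every slot, then overwrites the exception slots verbatim.
inline std::vector<double> DecodeVector(const EncodedVector& enc) {
  std::vector<double> out;
  out.reserve(enc.digits.size());
  for (std::int64_t d : enc.digits) out.push_back(Decode(d, enc.exponent));
  for (std::size_t k = 0; k < enc.exception_positions.size(); ++k) {
    out[enc.exception_positions[k]] = enc.exception_values[k];
  }
  return out;
}

}  // namespace alp_sketch
```

In this sketch, a column of decimal-looking values such as prices or sensor readings turns into small integers, which are much easier to compress than raw 64-bit floating-point bit patterns. Values that do not round-trip exactly, such as `1.0 / 3.0`, fall back to the exception list.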
What changes are included in this PR?
This PR introduces ALP (pseudo-decimal) encoding into the C++ Arrow code.
Adding the above required us to add
Are these changes tested?
Are there any user-facing changes?